4D Analytics

High Volume Data Ingester

This data collector requires a high-volume time series database to be deployed, into which the system can store millisecond-precision data.

Delimited CSV files are read and converted into time series measurements with various tags (attributes/meta-information about the data items, stored as strings) and fields (data readings, which can be String, Double Precision, Integer, or Boolean). Each row of CSV data in the file must also have a timestamp column. The timestamp will be stored in the high-volume time series database in UTC; all data timestamps in the files are assumed to be supplied in UTC.
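As a rough illustration of this mapping (a minimal sketch only, not the ingester's actual implementation), the Java snippet below takes the first row of the example file shown further down, splits it on its ; delimiter, and derives the tags, the field and the UTC timestamp that would be stored:

import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class RowMappingSketch {
    public static void main(String[] args) {
        // One row from the example file, split on the configured ';' delimiter
        String row = "\"02BFT01CT053||XQ01\";\"98.8086\";\"2019-05-06 11:20:51.790\";\"192\";\"LV TFR WNDG TEMP L3\"";
        String[] cols = row.replace("\"", "").split(";");

        // FILE COLUMN DEFINITIONS: tag=id,field=reading,ts,tag=signalquality,tag=description
        String idTag         = cols[0];                     // tag   (always a String)
        double readingField  = Double.parseDouble(cols[1]); // field (default type: double)
        String signalQuality = cols[3];                     // tag
        String description   = cols[4];                     // tag

        // TIMESTAMP COLUMN FORMAT: yyyy-MM-dd HH:mm:ss.SSS, interpreted as UTC
        DateTimeFormatter fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS");
        long epochMillis = LocalDateTime.parse(cols[2], fmt).toInstant(ZoneOffset.UTC).toEpochMilli();

        System.out.println("tags  : id=" + idTag + ", signalquality=" + signalQuality + ", description=" + description);
        System.out.println("field : reading=" + readingField);
        System.out.println("time  : " + epochMillis + " ms since epoch (UTC)");
    }
}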

Linking points will be created with information that allows them to link to their time series measurement data in the external high-volume time series database.

Example of a .csv file called site22_data.csv containing:

"02BFT01CT053||XQ01";"98.8086";"2019-05-06 11:20:51.790";"192";"LV TFR WNDG TEMP L3"
"02BFT01CT053||XQ01";"96.3021";"2019-05-06 13:50:30.350";"192";"LV TFR WNDG TEMP L3"
"02BHA00CE001Q||XQ01";"478.147";"2019-05-06 00:00:00.000";"195";"LV SWGR VOLT L1-L2"
"02EKD11CG101||XQ01";"23.6256";"2019-05-06 00:00:00.000";"195";"FG P REG POS"

This file would require configuration as:

FILE DELIMITER CHARACTER ;
HIGH VOLUME DATABASE mydb
HTTP API http://localhost:8086
HTTP USER username
HTTP PASSWORD thepasswordfortheuser
MEASUREMENT mymeasurement
TIMESTAMP COLUMN FORMAT yyyy-MM-dd HH:mm:ss.SSS
FILE COLUMN DEFINITIONS tag=id,field=reading,ts,tag=signalquality,tag=description
ID COLUMN id
CONTAINER NAME High Volume Data
GET SITENAME FROM FILE true
SITE COLUMN site
USE SITENAME AS PART OF KEY true
  • There should be no column header rows in the data file; it should be data-only.
  • The FILE COLUMN DEFINITIONS describe the CSV file format in terms of tags, fields, and one Timestamp [ts] column.
  • There can be several comma-separated ID COLUMNS referenced. In the example there is just one, but it could be id,description if both were required to form a unique link to the data items in the external time series database.
  • Additionally, the site name can be taken from the filename and stored in the high-volume database measurements.
  • The Site name column can be used as part of the ‘key’ in addition to keys in the ID COLUMN configuration to reference data in the high-volume database.
  • The site name is the first part of the file name, up to the first “_” character. For site22_data.csv, the site name is site22 (see the sketch after this list).
  • A User and Password are required.
  • The created tag and field names must all be unique.
  • Tags are always Strings. Fields, by default, will be treated as Double/Floating point values. An example of specifying field types in the File Column Definitions is given below.
  • Points will be created in a container called [CONTAINER NAME CONFIG] one level below the Org ROOT.
  • If Site Name is being fetched from file, then a container of this site name is created under [CONTAINER NAME CONFIG] first before points are created under this site container.
  • There must be one and only one timestamp column [ts in FILE COLUMN DEFINITIONS].
  • There must be at least one tag in the FILE COLUMN DEFINITIONS.
  • There must be at least one field in the FILE COLUMN DEFINITIONS.
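To make the file-name and container rules above concrete, here is a small illustrative sketch (the container API itself is not shown; the printed path simply mirrors the rules described in this list, using the example configuration):

import java.nio.file.Path;

public class SiteNameSketch {
    // Site name = everything in the file name before the first '_'
    static String siteNameFromFile(Path csvFile) {
        String name = csvFile.getFileName().toString();
        int idx = name.indexOf('_');
        return idx > 0 ? name.substring(0, idx) : name;
    }

    public static void main(String[] args) {
        String site = siteNameFromFile(Path.of("site22_data.csv")); // -> "site22"

        // With GET SITENAME FROM FILE = true, a site container is created under the
        // configured CONTAINER NAME before any points, so points end up under:
        //   Org ROOT / High Volume Data / site22 / <points>
        System.out.println("site name      : " + site);
        System.out.println("container path : Org ROOT / High Volume Data / " + site);
    }
}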

The above example would create the following measurement within the database called mydb:

Measurement : mymeasurement

Results of “Select * from mymeasurement”:

And in Digital Twin Explorer:

And as linking attributes on the points, we get, for example:
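How the ingester performs its writes internally is not documented here, but as an indication of how the example data could end up in mydb, the sketch below assumes an InfluxDB 1.x-style HTTP write endpoint (the HTTP API in the example listens on the default InfluxDB port) and posts one line-protocol record built from the first row of site22_data.csv. The site tag name and the escaping of spaces in tag values are assumptions made for illustration:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Base64;

public class WriteSketch {
    public static void main(String[] args) throws Exception {
        // HIGH VOLUME DATABASE, HTTP API, HTTP USER and HTTP PASSWORD from the example configuration
        String api = "http://localhost:8086";
        String db = "mydb";
        String auth = Base64.getEncoder().encodeToString("username:thepasswordfortheuser".getBytes());

        // One illustrative line-protocol record for the first row of site22_data.csv
        // (timestamp is 2019-05-06 11:20:51.790 UTC expressed in nanoseconds)
        String body = "mymeasurement,id=02BFT01CT053||XQ01,signalquality=192,"
                + "description=LV\\ TFR\\ WNDG\\ TEMP\\ L3,site=site22 "
                + "reading=98.8086 1557141651790000000";

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(api + "/write?db=" + db + "&precision=ns"))
                .header("Authorization", "Basic " + auth)
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println("HTTP " + response.statusCode()); // 204 means the write was accepted
    }
}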

Advanced: Specifying Types for Field Columns

To set the type that a field should be held as, specify the field as field=fieldname!type.

Where type can be:

  • string = will be held as a String
  • boolean = will be held as a Boolean
  • double = will be held as a floating point value (double precision)
  • long = will be held as an integer value (long precision)

For example:

MEASUREMENT	mymeasurement2
FILE COLUMN DEFINITIONS	tag=id,field=reading!long,ts,tag=signalquality,tag=description

For the example file, this would produce the corresponding data in the database, read into mymeasurement2.
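As an illustration only, the declared type simply controls how the raw CSV string is coerced before it is written; how the real ingester handles a value that does not parse as the declared type is not covered here:

public class FieldTypeSketch {
    // Coerce a raw CSV value according to the declared field type;
    // a field with no "!type" suffix defaults to double
    static Object coerce(String raw, String type) {
        switch (type) {
            case "string":  return raw;
            case "boolean": return Boolean.parseBoolean(raw.trim());
            case "long":    return Long.parseLong(raw.trim());
            case "double":
            default:        return Double.parseDouble(raw.trim());
        }
    }

    public static void main(String[] args) {
        System.out.println(coerce("192", "long"));       // 192     (integer value)
        System.out.println(coerce("98.8086", "double")); // 98.8086 (double precision)
        System.out.println(coerce("true", "boolean"));   // true
        System.out.println(coerce("OPEN", "string"));    // OPEN
    }
}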

Ignoring Columns in the CSV File

Columns can be ignored in the CSV file by just omitting a definition in the FILE COLUMN DEFINITIONS configuration.

For example, running the sample file with the tag=signalquality definition removed:

MEASUREMENT	mymeasurement3
FILE COLUMN DEFINITIONS 	tag=id,field=reading,ts, ,tag=description

For the example file, this would produce the corresponding data in the database, read into mymeasurement3, without the signalquality tag.
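A minimal sketch of the skip rule, assuming an empty slot in the FILE COLUMN DEFINITIONS simply means the corresponding CSV column is never read:

public class IgnoreColumnSketch {
    public static void main(String[] args) {
        // The 4th definition slot is empty, so the 4th CSV column (signalquality) is ignored
        String[] defs = "tag=id,field=reading,ts, ,tag=description".split(",", -1);
        String[] cols = {"02BFT01CT053||XQ01", "98.8086",
                         "2019-05-06 11:20:51.790", "192", "LV TFR WNDG TEMP L3"};

        for (int i = 0; i < defs.length; i++) {
            if (defs[i].isBlank()) {
                continue; // no definition -> column skipped
            }
            System.out.println(defs[i].trim() + " <- " + cols[i]);
        }
    }
}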

Notes

For further info on the formats that can be used in the TIMESTAMP COLUMN FORMAT, see https://docs.oracle.com/javase/8/docs/api/java/time/format/DateTimeFormatterBuilder.html#appendPattern-java.lang.String-.
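For instance, the pattern used in the example above, and a hypothetical ISO-style pattern for a differently formatted file, would both be valid TIMESTAMP COLUMN FORMAT values; the snippet below only demonstrates the pattern syntax (the second timestamp string is invented for illustration):

import java.time.LocalDateTime;
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;

public class TimestampFormatSketch {
    public static void main(String[] args) {
        // TIMESTAMP COLUMN FORMAT yyyy-MM-dd HH:mm:ss.SSS (as in the example configuration)
        DateTimeFormatter fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS");
        System.out.println(LocalDateTime.parse("2019-05-06 11:20:51.790", fmt));

        // A hypothetical file with ISO-style timestamps could use:
        //   TIMESTAMP COLUMN FORMAT yyyy-MM-dd'T'HH:mm:ssX
        DateTimeFormatter iso = DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ssX");
        System.out.println(OffsetDateTime.parse("2019-05-06T11:20:51Z", iso));
    }
}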